Population clustering based on copy number variations detected from next generation sequencing data
Abstract
Copy number variations (CNVs) can serve as significant biomarkers, and next generation sequencing (NGS) enables high-resolution detection of these CNVs. However, how to extract features from CNVs and apply them to genomic studies such as population clustering has become a major challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs in each sample. Therefore, NMF of CNVs can differentiate samples from different ethnic groups, i.e., perform population clustering. To validate the approach, we applied it to both simulated data and two real data sets from the 1000 Genomes Project. The results on simulated data demonstrate that the proposed method can recover the true common CNVs with high quality. The results on the first real data set show that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the results on the second show that the method can be applied to whole-genome data with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
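The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a feature matrix X with CNV regions as rows and samples as columns, uses the standard Lee-Seung multiplicative updates for squared error as a stand-in for whichever NMF algorithm the paper uses, and assigns each sample to the component with the largest weight. The function names (`nmf`, `cluster_labels`) and the toy matrix are hypothetical and made up for illustration.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def nmf(X, k, iters=500, seed=0, eps=1e-9):
    """Factor X (regions x samples) into S (regions x k) and W (k x samples).

    Uses Lee-Seung multiplicative updates, which keep S and W non-negative.
    S plays the role of the source matrix, W the weight matrix.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    S = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    W = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # W <- W * (S^T X) / (S^T S W), element-wise
        St = transpose(S)
        num, den = matmul(St, X), matmul(matmul(St, S), W)
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # S <- S * (X W^T) / (S W W^T), element-wise
        Wt = transpose(W)
        num, den = matmul(X, Wt), matmul(S, matmul(W, Wt))
        S = [[S[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return S, W

def cluster_labels(W):
    """Assign each sample (column of W) to its highest-weight component."""
    return [max(range(len(W)), key=lambda c: W[c][j]) for j in range(len(W[0]))]

# Toy feature matrix (invented values): 6 CNV regions x 6 samples.
# Samples 0-2 share CNVs in regions 0-2; samples 3-5 share CNVs in regions 3-5.
X = [
    [2, 2, 2, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [3, 3, 3, 0, 0, 0],
    [0, 0, 0, 2, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 3, 3, 3],
]
S, W = nmf(X, k=2)
labels = cluster_labels(W)
print(labels)
```

In this sketch each column of S can be read as a group-level CNV profile, and the labels come from the largest entry in each column of W. The number of components k is chosen by hand here; how the paper selects it is not stated in the abstract.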
Related articles
I-37: Establishing High Resolution Genomic Profiles of Single Cells Using Microarray and Next-Generation Sequencing Technologies
The nature and pace of genome mutation is largely unknown. Standard methods to investigate DNA-mutation rely on arraying or sequencing DNA from a population of cells, hence the genetic composition of individual cells is lost and de novo mutation in cell(s) is concealed within the bulk signal. We developed methods based on (SNP-) arraying and next-generation sequencing of single-cell whole-genom...
NGS catalog: A database of next generation sequencing studies in humans.
Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research since its advent only a few years ago, and they are expected to advance at an unprecedented pace in the following years. To provide the research community with a comprehensive NGS resource, we have developed the database Next Generation Sequencing Catalog (NGS Catalog, http://bioinfo.mc...
Discovering CNVs from read depth analysis of next generation sequencing data
Genomic Structural Variations (SV), including Copy Number Variations (CNV), are believed to contribute significantly to variations between human individuals, and may have larger effect on phenotype than SNPs [1, 2]. Therefore, the importance of their discovery has been realized. While originally detected from analysis of aCGH array data, SVs/CNVs can now be more efficiently discovered from next...
SMASH, a fragmentation and sequencing method for genomic copy number analysis.
Copy number variants (CNVs) underlie a significant amount of genetic diversity and disease. CNVs can be detected by a number of means, including chromosomal microarray analysis (CMA) and whole-genome sequencing (WGS), but these approaches suffer from either limited resolution (CMA) or are highly expensive for routine screening (both CMA and WGS). As an alternative, we have developed a next-gene...
Polygenic Versus Monogenic Causes of Hypercholesterolemia Ascertained Clinically.
OBJECTIVE Next-generation sequencing technology is transforming our understanding of heterozygous familial hypercholesterolemia, including revision of prevalence estimates and attribution of polygenic effects. Here, we examined the contributions of monogenic and polygenic factors in patients with severe hypercholesterolemia referred to a specialty clinic. APPROACH AND RESULTS We applied targe...
Journal:
- Journal of Bioinformatics and Computational Biology
Volume 12, Issue 4
Pages: -
Publication date: 2014